Abstract

Although noisy gene expression is widely accepted, its mechanisms are subjects of debate, stimulated largely by single- 
molecule experiments. This work is concerned with one such study, in which Choi et al., 2008, obtained real-time data and 
distributions of Lac permease in E. coli. They observed small and large protein bursts in strains with and without auxiliary 
operators. They also estimated the size and frequency of these bursts, but these were based on a stochastic model of a 
constitutive promoter. Here, we formulate and solve a stochastic model accounting for the existence of auxiliary operators 
and DNA loops. We find that DNA loop formation is so fast that small bursts are averaged out, making it impossible to 
extract their size and frequency from the data. In contrast, we can extract not only the size and frequency of the large bursts, 
but also the fraction of proteins derived from them. Finally, the proteins follow not the negative binomial distribution, but a 
mixture of two distributions, which reflect the existence of proteins derived from small and large bursts. 
Introduction

Data from many independent experiments show that the 
abundance of any given protein varies among individual cells of 
isogenic populations growing under identical conditions [1-3]. 
Early experiments with fluorescent reporters showed that such 
non-uniformity in protein abundance was due to the inherent 
stochasticity of gene expression (intrinsic noise) and various forms 
of cell-to-cell variation (extrinsic noise) [4,5]. The subsequent 
development of single-molecule techniques has led to deeper 
insights into the molecular mechanisms generating the noise [6,7]. 
By measuring the number of mRNAs in single cells, Golding et al. 
showed that transcription was too bursty to be modeled as a 
Poisson process [8]. Cai et al. [9] and Yu et al. [10] developed two 
different methods for measuring the number of proteins in 
single cells. The real-time data of both studies showed that 
protein synthesis was bursty, and the burst size was exponentially
distributed. Under this condition, the steady state
protein distribution follows the Gamma distribution,
$p(n) = n^{a-1} e^{-n/b} / [b^a \Gamma(a)]$, where $a$ and $b$ denote the mean burst
frequency and burst size [11]. Cai et al. and Yu et al. showed that
the Gamma distribution could fit their steady state data, and the 
values of the mean burst frequency and size derived from the 
steady state data agreed well with those obtained from real-time 
measurements. 
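As a quick numerical illustration of this result, the following Python sketch evaluates the Gamma density and its moments. The function names and the values $a = 2$, $b = 4$ are illustrative assumptions, not taken from the cited studies; the sketch only shows that, for a Gamma distribution, the Fano factor equals $b$ and the reciprocal of the squared noise equals $a$.

```python
import math


def gamma_pdf(n, a, b):
    """Gamma density p(n) = n^(a-1) exp(-n/b) / (b^a Gamma(a)) for n > 0."""
    log_p = (a - 1.0) * math.log(n) - n / b - a * math.log(b) - math.lgamma(a)
    return math.exp(log_p)


def gamma_moments(a, b):
    """Mean, variance, Fano factor and reciprocal squared noise of the Gamma distribution."""
    mean = a * b
    variance = a * b * b
    return {
        "mean": mean,
        "variance": variance,
        "fano": variance / mean,             # equals the burst size b
        "inv_noise": mean ** 2 / variance,   # equals the burst frequency a
    }


# Hypothetical burst frequency and burst size, chosen only for illustration.
print(gamma_moments(a=2.0, b=4.0))
```

This is why a fit of the Gamma distribution to steady state data returns the burst frequency and size directly, provided the underlying model assumptions hold.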

Armed with these results, Choi et al. [12] attacked a long-standing
problem. When non-induced cells of E. coli are exposed
to small concentrations of the gratuitous inducer TMG, the lac
operon is induced by stochastic switching of individual cells from 
the non-induced to the induced state [13]. Choi et al. sought the 
molecular mechanism of this stochastic switching. To this end, 
they first quantified the minimum number of LacY molecules 
required to switch a cell to the induced state, and found this 
threshold to be 375 molecules. They then suggested a molecular 
mechanism capable of yielding this threshold by appealing to the 
known mechanisms of repression and transcription of the lac 
operon. Repression is mediated by the stable DNA loops formed 
when the Lac repressor is simultaneously bound to the main and 
auxiliary operators (Fig. 1). Transcription can take place either due 
to partial dissociations, which occur when a repressor trapped in a 
DNA loop dissociates from the main operator, but not the 
auxiliary operator; or complete dissociations, which occur when the
repressor dissociates completely from the DNA. Choi et al. 
hypothesized that since a partially dissociated repressor remains 
attached to the DNA, it rapidly rebinds to the main operator, thus 
limiting the number of transcription events. Although the evidence 
suggests that no more than one mRNA is made during a partial 
dissociation, it is conceivable that multiple transcripts are made 
during a partial dissociation despite its short lifetime, thus leading 
to a small transcriptional burst. In contrast, a completely 
dissociated repressor takes a relatively long time to find an 
operator, which results in a large transcriptional burst. These large 
transcriptional bursts can provide enough proteins to cross the 
threshold for stochastic switching. 

Figure 1. Structure and states of the lac operon. The repressor R can bind to any of the three operators, namely the main operator $O_1$, and the
two auxiliary operators $O_2$, $O_3$. The repressor-free state is enclosed by the lower dashed box. The repressor-bound states, enclosed by the upper
dashed box, consist of the following 5 states (clockwise from the left): the $O_3$-bound state $O_3\cdot R$, the looped state $O_3\cdot R\cdot O_1$, the $O_1$-bound state
$O_1\cdot R$, the looped state $O_1\cdot R\cdot O_2$, and the $O_2$-bound state $O_2\cdot R$. Transcription occurs only if the operon is in the repressor-free state or the
repressor-bound state $O_2\cdot R$. Small bursts occur whenever the repressor dissociates from the looped state $O_1\cdot R\cdot O_2$ to form the $O_2$-bound state $O_2\cdot R$. Large
bursts occur whenever the repressor dissociates from the DNA to form the repressor-free state. Transitions between repressor-free and
repressor-bound states occur with propensities $k_0$ and $k_1$.
doi:10.1371/journal.pone.0102580.g001



Choi et al. tested the foregoing hypotheses as follows. The
statistics of small transcriptional bursts were obtained with strain
SX701, a lacY⁻ strain that exhibits mostly small bursts. To
capture the statistics of large bursts, they deleted the auxiliary
operators of their lacY⁻ cells, thus creating strain SX703 which
yields only large bursts. The statistics of the small and large bursts
were quantified by measuring the steady-state protein distributions
for both strains at various inducer concentrations. They then
concluded, based on the model of Friedman et al. [11], that if $\mu, \sigma^2$
denote the mean and variance of a protein distribution obtained
with strain SX701, then the Fano factor, $F = \sigma^2/\mu$, and the
reciprocal of the noise, $\eta^{-2} = (\mu/\sigma)^2$, represent the size and
frequency of the small bursts. Likewise, if $\bar{\mu}, \bar{\sigma}^2$ denote the mean
and variance for SX703, then $\bar{F} = \bar{\sigma}^2/\bar{\mu}$, $\bar{\eta}^{-2} = (\bar{\mu}/\bar{\sigma})^2$ represent
the size and frequency of the large bursts. Analysis of the data for
SX703 with this method showed that $\bar{\eta}^{-2}$ did not change with
inducer levels, but $\bar{F}$ increased dramatically (Fig. 2a), thus
confirming their hypothesis that large bursts can generate enough
proteins to trigger stochastic switching. Surprisingly, analysis of the
data for SX701 also yielded similar trends (Fig. 2b), but this was
attributed to the distortions created by the few cells exhibiting
large bursts. Indeed, if the data were filtered by removing the
contribution of large bursts, $\eta^{-2}$ and $F$ did not change much with
the inducer concentration (Fig. 2c), leading the authors to
conclude that the small burst frequency and size were independent
of the inducer level.
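To make the two statistics concrete, the sketch below computes $F = \sigma^2/\mu$ and $\eta^{-2} = (\mu/\sigma)^2$ from a list of single-cell protein counts. The counts are hypothetical numbers invented for illustration, not data from Choi et al., and the use of the population variance is an assumption of this sketch.

```python
from statistics import fmean, pvariance


def burst_statistics(counts):
    """Return mean, variance, Fano factor and reciprocal squared noise of protein counts."""
    mu = fmean(counts)
    var = pvariance(counts, mu)          # population variance about the sample mean
    if mu == 0 or var == 0:
        raise ValueError("Fano factor and noise are undefined for constant or all-zero data")
    return {
        "mean": mu,
        "variance": var,
        "fano": var / mu,                # interpreted by Choi et al. as a burst size
        "inv_noise": mu * mu / var,      # interpreted by Choi et al. as a burst frequency
    }


# Hypothetical protein counts in ten cells (illustrative only).
counts = [0, 0, 1, 0, 4, 0, 0, 9, 2, 0]
stats = burst_statistics(counts)
print(stats)
```

By construction the product of the two statistics is the mean, so they carry the same information as $\mu$ and $\sigma^2$; whether they can be read as burst size and frequency depends entirely on the model behind the distribution.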

Choi et al. also explained these results by appealing to the 
known states of the lac operon (Fig. 1). However, the mathematical 
model of Friedman et al., which forms the basis of their data 
analysis, does not account for these complexities — it only 
considers a constitutive (unregulated) promoter. Consequently, 
there is no strong support for the assumption that the proteins
follow the Gamma distribution; $F, \bar{F}$ represent the size of small
and large bursts; and $\eta^{-2}, \bar{\eta}^{-2}$ represent the frequency of small
and large bursts. The goal of this study is to test the validity of
these assumptions by formulating a stochastic model accounting 
for the known states of the operon, and deriving analytical 
expressions for the steady state protein distribution, Fano factor, 
and noise. 

There are stochastic models accounting for the details shown in
Fig. 1 [14-16], but these studies do not give analytical expressions
for the steady state protein distribution. The literature also
contains several stochastic models of gene regulation for which
analytical solutions were obtained [11,17-24], but they do not
account for the presence of multiple auxiliary operators and DNA
looping. Our model fills this gap in the theoretical literature, and
its analysis yields deeper insights into the experimental data.
Specifically, we show that the size and frequency of small bursts
cannot be extracted from the data for strain SX701 because they
are averaged out. However, we can extract not only the size and
frequency of the large bursts, but also their contribution to total
protein synthesis, provided the data are not filtered (Fig. 2d). This
result also yields tests for the consistency of the model by providing
relationships between the size and frequency of large bursts in
strains SX701 and SX703. Finally, we show that neither of the
two strains follows the negative binomial (or Gamma) distribution.

Figure 2. The variation of the Fano factor and the reciprocal of the noise with the inducer level [12]. (a) Derived from data for strain
SX703, which exhibits only large transcriptional bursts, since it lacks both auxiliary operators. Choi et al. proposed that $\bar{F}$ and $1/\bar{\eta}^2$ represent the size
and frequency of large transcriptional bursts. (b) Derived from raw data for strain SX701, which exhibits mostly small transcriptional bursts, since it
has both auxiliary operators. Choi et al. did not consider these data on the grounds that the occurrence of large bursts in a few cells distorted the
statistics of the small transcriptional bursts. (c) Derived from data for strain SX701 that were filtered by rejecting the data corresponding to the few
cells exhibiting large bursts. Choi et al. proposed that this $F$ and $1/\eta^2$ represent the size and frequency of small transcriptional bursts. (d) Mean size of
large transcriptional bursts in strain SX701, $b_c b$ (full red curve), and fraction of proteins derived from such bursts, $f_c$ (full blue curve), estimated from
the data in (b). The ordinate of the dashed red line is one-third of the ordinate of the $\bar{F}$ vs. [TMG] line shown in (a), and therefore represents one-third
of the (large) transcriptional burst size in strain SX703. The proximity of the full and dashed red lines implies that the mean size of large transcriptional
bursts in strain SX701 is approximately one-third of the transcriptional burst size in strain SX703, which is consistent with our model predictions.
doi:10.1371/journal.pone.0102580.g002

The paper is organized as follows. In the Analysis section, we 
describe the model, derive the master equation, and explain the 
key approximations used to obtain the steady state protein 
distribution. In the Results section, we perform simulations to
check the validity of the analytical expression for the protein
distribution, and we derive the expressions for the mean and the
variance of the distribution. We also show that the mean, variance,
and hence, the Fano factor and the reciprocal of the noise, can be
expressed in terms of the size and frequency of the transcriptional
and translational bursts. In the Discussion section, the latter are
compared with the assumptions of Choi et al. We also show that
negative binomial distributions are obtained only if the size of the
large transcriptional bursts is relatively small.

Analysis 

The model scheme, shown in Figure 1, is based on the following
facts enunciated by Oehler et al. [25,26]. The lac operon of E. coli
contains three operators, namely the main operator $O_1$, and the
two auxiliary operators $O_2$, $O_3$, lying downstream and upstream of
$O_1$. The lac operon rarely binds more than one Lac repressor,
and this single repressor R can bind to any one of the operators,
thus forming the operon states, $O_1\cdot R$, $O_2\cdot R$, and $O_3\cdot R$. Since the
tetrameric repressor is a "dimer of dimers," it has a free dimer
even after it is bound to one of the operators. This free dimer can
bind to one of the remaining two free operators, thus forming a
DNA loop. In principle, three looped states are feasible, namely,
$O_1\cdot R\cdot O_2$, $O_1\cdot R\cdot O_3$, and $O_2\cdot R\cdot O_3$, but the last one is very unlikely
to form. We are therefore led to consider only six feasible states of
the operon: the repressor-free state, and the five repressor-bound
states, $O_1\cdot R$, $O_2\cdot R$, $O_3\cdot R$, $O_1\cdot R\cdot O_2$, and $O_1\cdot R\cdot O_3$. Only
three of these six states permit transcriptional activity, namely, the
repressor-free state and the repressor-bound states, $O_2\cdot R$ and
$O_3\cdot R$. The first two states permit full transcriptional activity. The
last state can be neglected since it permits only 3-5% of the full
transcriptional activity.

The model kinetics are based on the following assumptions. All 
cells have the same number of repressors, N, which is tantamount 
to neglecting extrinsic noise [4]. Since association of a cytosolic 
repressor to an operator is diffusion-limited, we assume that a 
cytosolic repressor has the same propensity, $k_a N$, for association
with each of the operators. In contrast, the propensity for
dissociation of operator-bound repressor does depend on the
identity of the operator, and we denote the propensity for
dissociation of $O_i$-bound repressor by $k_{O_i}$. Next, we consider the
kinetics of looping. The looped state $O_1\cdot R\cdot O_2$ can be formed from
either $O_1\cdot R$ or $O_2\cdot R$, but both pathways have the same propensity
because they are driven by the same local concentration effect
[26]. Thus, we denote the propensity for formation of $O_1\cdot R\cdot O_2$
from $O_1\cdot R$ or $O_2\cdot R$ by the same symbol, $k_{O_1O_2}$. Similarly, we
denote the propensity for formation of $O_1\cdot R\cdot O_3$ from $O_1\cdot R$ or
$O_3\cdot R$ by the same symbol, $k_{O_1O_3}$. Finally, we let $v_0, d_0$ denote the
propensities for mRNA synthesis and degradation, and $v_1, d_1$
denote the propensities for protein synthesis and dilution.

Equations 

We take a master equation approach to describe the system, our 
state variables being the number of mRNAs, m, the number of 
proteins, n, and the six states of the operon shown in Figure 1. We
let $p^{s}_{m,n}$ denote the probability of m mRNAs and n proteins when
the operon is in state s. Here, $s = f$ when the operon is free, and
$s = i$ or $s = ij$ when the operon is repressor-bound, where $i, j$ are
integers identifying the operator(s) to which the repressor is bound
(e.g., $s = 1$ denotes the state $O_1\cdot R$ and $s = 12$ denotes the state
$O_1\cdot R\cdot O_2$). Then the master equations for the kinetic scheme in
Figure 1 are



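$$\begin{aligned}
\frac{dp^{f}_{m,n}}{dt} ={}& \left[k_{O_1}p^{1}_{m,n} + k_{O_2}p^{2}_{m,n} + k_{O_3}p^{3}_{m,n} - 3k_aN\,p^{f}_{m,n}\right] \\
&+ v_0\left(p^{f}_{m-1,n} - p^{f}_{m,n}\right) + v_1 m\left(p^{f}_{m,n-1} - p^{f}_{m,n}\right) \\
&+ d_0\left[(m+1)p^{f}_{m+1,n} - m\,p^{f}_{m,n}\right] + d_1\left[(n+1)p^{f}_{m,n+1} - n\,p^{f}_{m,n}\right],
\end{aligned} \tag{1}$$

$$\begin{aligned}
\frac{dp^{1}_{m,n}}{dt} ={}& \left[k_{O_2}p^{12}_{m,n} + k_{O_3}p^{13}_{m,n} + k_aN\,p^{f}_{m,n} - \left(k_{O_1} + k_{O_1O_2} + k_{O_1O_3}\right)p^{1}_{m,n}\right] \\
&+ v_1 m\left(p^{1}_{m,n-1} - p^{1}_{m,n}\right) + d_0\left[(m+1)p^{1}_{m+1,n} - m\,p^{1}_{m,n}\right] \\
&+ d_1\left[(n+1)p^{1}_{m,n+1} - n\,p^{1}_{m,n}\right],
\end{aligned} \tag{2}$$

$$\begin{aligned}
\frac{dp^{2}_{m,n}}{dt} ={}& \left[k_{O_1}p^{12}_{m,n} + k_aN\,p^{f}_{m,n} - \left(k_{O_2} + k_{O_1O_2}\right)p^{2}_{m,n}\right] \\
&+ v_0\left(p^{2}_{m-1,n} - p^{2}_{m,n}\right) + v_1 m\left(p^{2}_{m,n-1} - p^{2}_{m,n}\right) \\
&+ d_0\left[(m+1)p^{2}_{m+1,n} - m\,p^{2}_{m,n}\right] + d_1\left[(n+1)p^{2}_{m,n+1} - n\,p^{2}_{m,n}\right],
\end{aligned} \tag{3}$$

$$\begin{aligned}
\frac{dp^{3}_{m,n}}{dt} ={}& \left[k_{O_1}p^{13}_{m,n} + k_aN\,p^{f}_{m,n} - \left(k_{O_3} + k_{O_1O_3}\right)p^{3}_{m,n}\right] \\
&+ v_1 m\left(p^{3}_{m,n-1} - p^{3}_{m,n}\right) + d_0\left[(m+1)p^{3}_{m+1,n} - m\,p^{3}_{m,n}\right] \\
&+ d_1\left[(n+1)p^{3}_{m,n+1} - n\,p^{3}_{m,n}\right],
\end{aligned} \tag{4}$$

$$\begin{aligned}
\frac{dp^{12}_{m,n}}{dt} ={}& \left[k_{O_1O_2}\left(p^{1}_{m,n} + p^{2}_{m,n}\right) - \left(k_{O_1} + k_{O_2}\right)p^{12}_{m,n}\right] \\
&+ v_1 m\left(p^{12}_{m,n-1} - p^{12}_{m,n}\right) + d_0\left[(m+1)p^{12}_{m+1,n} - m\,p^{12}_{m,n}\right] \\
&+ d_1\left[(n+1)p^{12}_{m,n+1} - n\,p^{12}_{m,n}\right],
\end{aligned} \tag{5}$$

$$\begin{aligned}
\frac{dp^{13}_{m,n}}{dt} ={}& \left[k_{O_1O_3}\left(p^{1}_{m,n} + p^{3}_{m,n}\right) - \left(k_{O_1} + k_{O_3}\right)p^{13}_{m,n}\right] \\
&+ v_1 m\left(p^{13}_{m,n-1} - p^{13}_{m,n}\right) + d_0\left[(m+1)p^{13}_{m+1,n} - m\,p^{13}_{m,n}\right] \\
&+ d_1\left[(n+1)p^{13}_{m,n+1} - n\,p^{13}_{m,n}\right].
\end{aligned} \tag{6}$$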



Our goal is to derive the steady state protein distribution 
corresponding to these equations. 
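Before reducing the model, it can help to look at the operon-state part of the master equations in isolation. The sketch below is an illustration, not part of the original analysis: it lists the transitions among the six operon states with the Table 1 propensities, assembles the balance equations, and solves them for the stationary occupancies by Gauss-Jordan elimination. The state labels, the dictionary layout, and the pure-Python solver are assumptions of this sketch.

```python
# Propensities from Table 1 (s^-1).
K_A_N = 0.07                           # association with any one operator
K_O = {1: 0.0016, 2: 0.019, 3: 0.73}   # dissociation from O1, O2, O3
K_LOOP = {2: 4.0, 3: 24.0}             # formation of the O1-R-O2 and O1-R-O3 loops

STATES = ["f", "1", "2", "3", "12", "13"]


def transitions():
    """List (source, target, propensity) for the operon transitions in eqs. (1)-(6)."""
    moves = []
    for i in (1, 2, 3):
        moves.append(("f", str(i), K_A_N))        # repressor binds operator i
        moves.append((str(i), "f", K_O[i]))       # complete dissociation
    for j in (2, 3):
        loop = "1" + str(j)
        moves.append(("1", loop, K_LOOP[j]))      # loop closes from O1-R
        moves.append((str(j), loop, K_LOOP[j]))   # loop closes from Oj-R
        moves.append((loop, "1", K_O[j]))         # loop opens at Oj
        moves.append((loop, str(j), K_O[1]))      # loop opens at O1 (partial dissociation)
    return moves


def stationary_distribution():
    """Solve the steady-state balance equations with sum(p) = 1 by Gauss-Jordan elimination."""
    n = len(STATES)
    index = {s: k for k, s in enumerate(STATES)}
    A = [[0.0] * n for _ in range(n)]
    for src, dst, rate in transitions():
        A[index[dst]][index[src]] += rate   # probability flowing into dst
        A[index[src]][index[src]] -= rate   # probability leaving src
    A[n - 1] = [1.0] * n                    # replace one redundant equation by normalisation
    rhs = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, n):
                    A[r][c] -= factor * A[col][c]
                rhs[r] -= factor * rhs[col]
    return {s: rhs[index[s]] / A[index[s]][index[s]] for s in STATES}


p = stationary_distribution()
bound = 1.0 - p["f"]
print({s: round(p[s] / bound, 4) for s in STATES if s != "f"})
```

With these propensities the operon spends almost all of its time in the two looped states, which is the observation the model reduction builds on.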

Parameter values 

Table 1 shows the parameter values in the absence of the
inducer. The parameters $d_0$ and $d_1$ reflect the experimental values
measured by Yu et al. [10]. The parameter $v_1$ was chosen such
that the mean burst size, $b = v_1/d_0$, agreed with the measured
value $b = 4$, reported by Yu et al. The parameter $v_0$ was estimated
by assuming that the mean burst frequency of fully induced cells,
$a = v_0/d_1$, is 600. The rationale for this assumption is as follows.
An uninduced cell contains, on average, 0.5 molecules of the
tetrameric LacZ [9], and hence, is expected to contain 2 molecules
of the monomeric LacY. Since the number of LacY and LacZ
molecules increases ~1200-fold in fully induced cells [25], there
are 2400 LacY molecules in such cells, i.e., $ab = 2400$, which
implies that $a = 600$. All other parameter values were estimated
using the method of Vilar & Leibler [15]. They estimated all the
equilibrium constants using the repression data of Oehler et al.
[26]. Then, given an experimental estimate of any one parameter,
they could find all other parameter values. They took that one
parameter to be the dissociation rate constant, and assigned
to it the value obtained from in vitro data [27]. Based on this
procedure, the association rate, $k_a N$, was found to be 0.73 s$^{-1}$.
However, recent in vivo measurements show that the association
rate for a dimeric repressor is 0.014 s$^{-1}$ [28]. If the dimeric and
tetrameric repressor associate at the same rate, and each cell
contains 10 repressors [29], the estimated value of $k_a N$ from these
measurements is 0.14 s$^{-1}$. We assumed $k_a N = 0.07$ s$^{-1}$, and
chose $k_{O_1}$, $k_{O_2}$, $k_{O_3}$, $k_{O_1O_2}$, $k_{O_1O_3}$ to ensure consistency with the
repression data. As we show later, these parameter values yield
good fits of the experimental data.

Table 1. Parameter values in the absence of inducer.

Parameter     Value (in s$^{-1}$)     Parameter       Value (in s$^{-1}$)
$d_0$         0.011                   $k_{O_1}$       0.0016
$d_1$         0.0002                  $k_{O_2}$       0.019
$v_0$         0.12                    $k_{O_3}$       0.73
$v_1$         0.044                   $k_{O_1O_2}$    4
$k_a N$       0.07                    $k_{O_1O_3}$    24
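The arithmetic behind these estimates is short enough to check directly. The following sketch recomputes $b$, $a$, and the in vivo estimate of $k_a N$ from the numbers quoted above; the variable names are illustrative, and the one-to-one correspondence between LacY and LacZ monomers is the assumption stated in the text.

```python
# Table 1 values (s^-1)
d0, d1 = 0.011, 0.0002   # mRNA degradation, protein dilution
v0, v1 = 0.12, 0.044     # mRNA synthesis, protein synthesis

b = v1 / d0              # translational burst size (proteins per mRNA)
a = v0 / d1              # transcription frequency of fully induced cells
gamma = d0 / d1          # ratio of protein to mRNA lifetimes

# Rationale for a = 600, following the text
lacz_tetramers_uninduced = 0.5
lacy_uninduced = 4 * lacz_tetramers_uninduced   # one LacY per LacZ monomer, as the text assumes
lacy_induced = 1200 * lacy_uninduced            # ~1200-fold induction
a_from_induction = lacy_induced / b

# In vivo estimate of the association propensity
k_a_dimer = 0.014          # s^-1 per repressor [28]
repressors_per_cell = 10   # [29]
k_a_N_in_vivo = k_a_dimer * repressors_per_cell

print(f"b = {b:.2f}, a = {a:.0f}, gamma = {gamma:.0f}")
print(f"induced LacY = {lacy_induced:.0f}, implied a = {a_from_induction:.0f}")
print(f"in vivo k_a N = {k_a_N_in_vivo:.2f} s^-1")
```

The value $k_a N = 0.07$ s$^{-1}$ adopted in Table 1 is half of this in vivo estimate.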

Since we are also concerned with protein distributions in the 
presence of the inducer, it is necessary to identify the parameters 
that change under these conditions. We assume that $v_0$, $d_0$, $v_1$, and
$d_1$ are independent of the inducer level. The propensities for
looping, $k_{O_1O_2}$, $k_{O_1O_3}$, are also unlikely to change in the presence of
small inducer concentrations because a partially dissociated
repressor has too little time to interact with the inducer: In the
presence of 10 μM IPTG (considered equivalent to 100 μM
TMG), the pseudo-first-order rate constant for repressor-inducer
binding is 0.1 s$^{-1}$ [30], which is negligible compared to the
looping rate constant of 4 s$^{-1}$. Thus, the only parameters that can
change with the inducer concentration are the association rate,
$k_a N$, and the dissociation rates, $k_{O_i}$. Based on the analysis of their
experimental protein distributions, Choi et al. concluded that the
dissociation rates are independent of the inducer concentration,
while the association rate decreases with the inducer concentration.
We shall also assume that this is the case. This assumption
holds only if the concentration of TMG is significantly below
1 mM [31,32], a condition satisfied by all the concentrations used
by Choi et al., except possibly the highest concentration of
200 μM.
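The time-scale argument for the looping propensities can be quantified under one simplifying assumption that is ours, not the authors': if inducer binding and relooping are treated as two competing first-order processes, the chance that the inducer binds first is the ratio of its rate to the sum of both rates.

```python
def probability_inducer_wins(k_inducer, k_loop):
    """Chance that inducer binding (rate k_inducer) precedes relooping (rate k_loop).

    Assumes two independent, competing first-order (exponential) processes.
    """
    return k_inducer / (k_inducer + k_loop)


k_inducer = 0.1   # s^-1, repressor-inducer binding at 10 uM IPTG [30]
for name, k_loop in (("O1-R-O2", 4.0), ("O1-R-O3", 24.0)):   # looping propensities, Table 1
    print(name, round(probability_inducer_wins(k_inducer, k_loop), 4))
```

Under this assumption only a few percent of partial dissociations would be intercepted by the inducer, which is consistent with treating the looping propensities as independent of the inducer level.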



Model reduction

The determination of the steady state protein distribution
corresponding to eqs. (1)-(6) is facilitated by the fact that loop
formation and mRNA degradation are relatively fast.

Rapid loop formation. Table 1 shows that in the absence of
the inducer, $k_{O_1O_2}, k_{O_1O_3}$ are much greater than all other
propensities, and as explained above, this persists even in the
presence of low inducer concentrations. It follows that the
repressor-bound states rapidly equilibrate on the fast time scale
of loop formation, after which there are relatively infrequent
transitions between the repressor-free and repressor-bound states.
To capture this physical fact, we replace eq. (2) with the equation
for the slow variable

$$p^{b}_{m,n} \equiv p^{1}_{m,n} + p^{2}_{m,n} + p^{3}_{m,n} + p^{12}_{m,n} + p^{13}_{m,n}, \tag{7}$$

which represents the probability of m mRNAs and n proteins
when the operon is repressor-bound. We then apply the quasi-steady
state approximation to the fast variables, $p^{2}_{m,n}$, $p^{3}_{m,n}$, $p^{12}_{m,n}$,
$p^{13}_{m,n}$, and find that the probabilities of the equilibrated bound
states are given by the relations

$$\frac{p^{12}_{m,n}}{p^{b}_{m,n}} = \frac{k_{O_1O_2}/k_{O_2}}{k_{O_1O_2}/k_{O_2} + k_{O_1O_3}/k_{O_3}}, \tag{8}$$

$$\frac{p^{13}_{m,n}}{p^{b}_{m,n}} = \frac{k_{O_1O_3}/k_{O_3}}{k_{O_1O_2}/k_{O_2} + k_{O_1O_3}/k_{O_3}}, \tag{9}$$

$$\frac{p^{1}_{m,n}}{p^{b}_{m,n}} = \frac{k_{O_2}}{k_{O_1O_2}}\,\frac{p^{12}_{m,n}}{p^{b}_{m,n}} = \frac{1}{k_{O_1O_2}/k_{O_2} + k_{O_1O_3}/k_{O_3}}, \tag{10}$$

$$\frac{p^{2}_{m,n}}{p^{b}_{m,n}} = \frac{k_{O_1}}{k_{O_1O_2}}\,\frac{p^{12}_{m,n}}{p^{b}_{m,n}} = \frac{k_{O_1}/k_{O_2}}{k_{O_1O_2}/k_{O_2} + k_{O_1O_3}/k_{O_3}}, \tag{11}$$

$$\frac{p^{3}_{m,n}}{p^{b}_{m,n}} = \frac{k_{O_1}}{k_{O_1O_3}}\,\frac{p^{13}_{m,n}}{p^{b}_{m,n}} = \frac{k_{O_1}/k_{O_3}}{k_{O_1O_2}/k_{O_2} + k_{O_1O_3}/k_{O_3}}, \tag{12}$$

which express the physical fact that after the bound states reach
quasi-equilibrium, they obey the principle of detailed balance and
are almost always in one of the looped states (Table 2). Moreover,
the slow variables follow the equations
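$$\begin{aligned}
\frac{dp^{f}_{m,n}}{dt} ={}& \left(k_0\,p^{b}_{m,n} - k_1\,p^{f}_{m,n}\right) + v_0\left(p^{f}_{m-1,n} - p^{f}_{m,n}\right) \\
&+ v_1 m\left(p^{f}_{m,n-1} - p^{f}_{m,n}\right) + d_0\left[(m+1)p^{f}_{m+1,n} - m\,p^{f}_{m,n}\right] \\
&+ d_1\left[(n+1)p^{f}_{m,n+1} - n\,p^{f}_{m,n}\right],
\end{aligned} \tag{13}$$

$$\begin{aligned}
\frac{dp^{b}_{m,n}}{dt} ={}& \left(k_1\,p^{f}_{m,n} - k_0\,p^{b}_{m,n}\right) + v_0\lambda\left(p^{b}_{m-1,n} - p^{b}_{m,n}\right) \\
&+ v_1 m\left(p^{b}_{m,n-1} - p^{b}_{m,n}\right) + d_0\left[(m+1)p^{b}_{m+1,n} - m\,p^{b}_{m,n}\right] \\
&+ d_1\left[(n+1)p^{b}_{m,n+1} - n\,p^{b}_{m,n}\right],
\end{aligned} \tag{14}$$

where

$$\lambda \equiv \frac{p^{2}_{m,n}}{p^{b}_{m,n}} = \frac{k_{O_1}/k_{O_2}}{k_{O_1O_2}/k_{O_2} + k_{O_1O_3}/k_{O_3}}, \tag{15}$$

$$k_0 \equiv k_{O_1}\left(\frac{p^{1}_{m,n}}{p^{b}_{m,n}}\right) + k_{O_2}\left(\frac{p^{2}_{m,n}}{p^{b}_{m,n}}\right) + k_{O_3}\left(\frac{p^{3}_{m,n}}{p^{b}_{m,n}}\right), \tag{16}$$

$$k_1 \equiv 3k_aN. \tag{17}$$

Table 2. Magnitudes of important derived parameters in the absence of the inducer.

Parameter                       Value                   Parameter     Value
$p^{12}_{m,n}/p^{b}_{m,n}$      0.86                    $k_0$         $2\times10^{-5}$ s$^{-1}$
$p^{13}_{m,n}/p^{b}_{m,n}$      0.14                    $k_1$         0.22 s$^{-1}$
$p^{1}_{m,n}/p^{b}_{m,n}$       $4\times10^{-3}$        $\lambda$     $3\times10^{-4}$
$p^{2}_{m,n}/p^{b}_{m,n}$       $3\times10^{-4}$        $a$           600
$p^{3}_{m,n}/p^{b}_{m,n}$       $9\times10^{-6}$        $b$           4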

Equations (13)-(14) describe the evolution of the reduced model
containing only two operon states, the free and the equilibrated
bound states, between which are transitions with propensities,
$k_0, k_1$, which are slow compared to the propensities for looping
(Table 2). This is highlighted in Figure 1 by enclosing the free and
bound states in dashed boxes, and drawing dashed arrows with
labels, $k_0$ and $k_1$, to denote the transitions between them. The
reduced model is similar to Shahrezaei & Swain's three-stage
model for a regulated promoter [22], but there is an important
difference. Both operon states are transcriptionally active: The
transcription rates in the free and bound states are $v_0$ and $v_0\lambda$,
respectively, where $\lambda$ is the conditional probability of the $O_2\cdot R$
state, given that the operon is repressor-bound. Even
though $\lambda \ll 1$ (Table 2), we cannot neglect the transcription from
the bound state, since it captures the effect of the small
transcriptional bursts, which can account, as we show later, for
almost 80% of the mRNAs synthesized per cell cycle.

Table 2 shows that in the absence of the inducer, $k_0 \ll k_1$, so
that the free state occurs infrequently and lasts for very short
periods of time, i.e., the operon is almost always repressor-bound
($p^{b} \approx 1$). We shall show later that this persists
in the presence of the low inducer concentrations (< 200 μM
TMG) used by Choi et al. Hence, under the experimental
conditions of interest, the conditional probabilities in (8)-(12) are
essentially equal to the absolute probabilities.
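The derived quantities of the reduced model follow from Table 1 by direct substitution into (8)-(12) and (15)-(17). The sketch below carries out that substitution; the function name and return layout are assumptions of this illustration. It also evaluates the share of mRNAs attributable to small bursts as $\lambda/(\lambda + k_0/k_1)$, which presumes that the bound state dominates ($k_0 \ll k_1$).

```python
# Table 1 propensities (s^-1)
K_A_N = 0.07
K_O1, K_O2, K_O3 = 0.0016, 0.019, 0.73
K_O1O2, K_O1O3 = 4.0, 24.0


def derived_parameters():
    """Evaluate eqs. (8)-(12) and (15)-(17) with the Table 1 values."""
    denom = K_O1O2 / K_O2 + K_O1O3 / K_O3
    frac = {
        "12": (K_O1O2 / K_O2) / denom,   # eq. (8)
        "13": (K_O1O3 / K_O3) / denom,   # eq. (9)
        "1": 1.0 / denom,                # eq. (10)
        "2": (K_O1 / K_O2) / denom,      # eq. (11)
        "3": (K_O1 / K_O3) / denom,      # eq. (12)
    }
    lam = frac["2"]                                                   # eq. (15)
    k0 = K_O1 * frac["1"] + K_O2 * frac["2"] + K_O3 * frac["3"]       # eq. (16)
    k1 = 3.0 * K_A_N                                                  # eq. (17)
    small_burst_share = lam / (lam + k0 / k1)   # valid when k0 << k1
    return frac, lam, k0, k1, small_burst_share


frac, lam, k0, k1, small_burst_share = derived_parameters()
print({s: f"{v:.2e}" for s, v in frac.items()})
print(f"lambda = {lam:.2e}, k0 = {k0:.2e} s^-1, k1 = {k1:.2f} s^-1")
print(f"share of mRNAs from small bursts = {small_burst_share:.2f}")
```

The results agree with Table 2 to the precision shown there; the rounded $k_a N$ of Table 1 gives $k_1 = 0.21$ s$^{-1}$, close to the tabulated 0.22 s$^{-1}$, a difference that presumably reflects rounding. The small-burst share comes out just below 0.8, in line with the statement above.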

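Rapid mRNA degradation. The second approximation
appeals to the fact that mRNA degradation is rapid compared
to protein dilution, i.e., $d_0 \gg d_1$. To apply this approximation, we
follow Shahrezaei & Swain [22]. Thus, we begin by rescaling time
with respect to the time scale for protein dilution. Letting
$\tau = d_1 t$ transforms the reduced equations to the form

$$\begin{aligned}
\frac{\partial p^{f}_{m,n}}{\partial\tau} ={}& \left(\kappa_0\,p^{b}_{m,n} - \kappa_1\,p^{f}_{m,n}\right) + a\left(p^{f}_{m-1,n} - p^{f}_{m,n}\right) \\
&+ \gamma b m\left(p^{f}_{m,n-1} - p^{f}_{m,n}\right) + \gamma\left[(m+1)p^{f}_{m+1,n} - m\,p^{f}_{m,n}\right] \\
&+ \left[(n+1)p^{f}_{m,n+1} - n\,p^{f}_{m,n}\right],
\end{aligned} \tag{18}$$

$$\begin{aligned}
\frac{\partial p^{b}_{m,n}}{\partial\tau} ={}& \left(\kappa_1\,p^{f}_{m,n} - \kappa_0\,p^{b}_{m,n}\right) + a\lambda\left(p^{b}_{m-1,n} - p^{b}_{m,n}\right) \\
&+ \gamma b m\left(p^{b}_{m,n-1} - p^{b}_{m,n}\right) + \gamma\left[(m+1)p^{b}_{m+1,n} - m\,p^{b}_{m,n}\right] \\
&+ \left[(n+1)p^{b}_{m,n+1} - n\,p^{b}_{m,n}\right],
\end{aligned} \tag{19}$$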



where $\kappa_0 = k_0/d_1$ and $\kappa_1 = k_1/d_1$ are the frequencies of transitions
between the free and bound operon states, $a = v_0/d_1$ is the
frequency of unregulated transcription (in the absence of the
repressor), $b = v_1/d_0$ is the translational burst size, i.e., the average
number of proteins produced per mRNA, and $\gamma = d_0/d_1 \gg 1$ is the
ratio of protein and mRNA lifetimes. Next, we define the
generating functions, $f^{f}(z,z',\tau) = \sum_{m,n} z'^{m} z^{n} p^{f}_{m,n}$ and
$f^{b}(z,z',\tau) = \sum_{m,n} z'^{m} z^{n} p^{b}_{m,n}$, to obtain the partial differential
equations

$$\frac{\partial f^{f}}{\partial\tau} - \gamma\left[bv(1+u) - u\right]\frac{\partial f^{f}}{\partial u} + v\frac{\partial f^{f}}{\partial v} = \left(\kappa_0 f^{b} - \kappa_1 f^{f}\right) + a u f^{f}, \tag{20}$$

$$\frac{\partial f^{b}}{\partial\tau} - \gamma\left[bv(1+u) - u\right]\frac{\partial f^{b}}{\partial u} + v\frac{\partial f^{b}}{\partial v} = \left(\kappa_1 f^{f} - \kappa_0 f^{b}\right) + a\lambda u f^{b}, \tag{21}$$

where $u = z' - 1$ and $v = z - 1$. Since $\gamma \gg 1$, we have the quasi-steady
state approximation, $bv(1+u) - u \approx 0$, i.e., $u \approx bv/(1-bv)$. The steady state
protein distribution is therefore given by the equations



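$$v\frac{df^{f}}{dv} = \left(\kappa_0 f^{b} - \kappa_1 f^{f}\right) + a\,\frac{bv}{1-bv}\,f^{f}, \tag{22}$$

$$v\frac{df^{b}}{dv} = \left(\kappa_1 f^{f} - \kappa_0 f^{b}\right) + a\lambda\,\frac{bv}{1-bv}\,f^{b}. \tag{23}$$

Since we are interested in the generating function,
$f(v) = f^{f}(v) + f^{b}(v)$, it is convenient to rewrite these equations as

$$v\frac{df^{f}}{dv} = \left[a\,\frac{bv}{1-bv} - \left(\kappa_0 + \kappa_1\right)\right]f^{f} + \kappa_0 f, \tag{24}$$

$$v\frac{df}{dv} = a(1-\lambda)\,\frac{bv}{1-bv}\,f^{f} + a\lambda\,\frac{bv}{1-bv}\,f, \tag{25}$$

which reduce to the second-order differential equation

$$\frac{d^{2}f}{dv^{2}} + \left(\frac{\kappa_0 + \kappa_1}{v} + \frac{1 + a + a\lambda}{v - 1/b}\right)\frac{df}{dv} + \frac{a}{v - 1/b}\left(\frac{\kappa_0 + \kappa_1\lambda}{v} + \frac{a\lambda}{v - 1/b}\right)f = 0. \tag{26}$$

We solve this equation with the initial condition, $f(0) = 1$, and
revert to $z$ as the independent variable, to obtain the following
generating function for the steady state protein distribution

$$f(z) = \left[1 - b(z-1)\right]^{-a\lambda}\,{}_2F_1\left[\alpha, \beta, \kappa_0 + \kappa_1; b(z-1)\right], \tag{27}$$

where ${}_2F_1$ denotes the Gaussian hypergeometric function and

$$\alpha, \beta = \frac{1}{2}\left[a(1-\lambda) + (\kappa_0 + \kappa_1) \pm \sqrt{\left\{a(1-\lambda) + (\kappa_0 + \kappa_1)\right\}^{2} - 4a(1-\lambda)\kappa_0}\right]. \tag{28}$$

As expected, if $\lambda = 0$, (27) reduces to the generating function of the
negative hypergeometric distribution [22]. In general, however,
(27) is the generating function for a mixture of the negative
binomial and negative hypergeometric distributions, which
reflects, as we show below, the existence of two sub-populations
of proteins, namely those derived from small and large
transcriptional bursts.

Results

Analytical expressions for the statistics of the protein
distributions

Strain with auxiliary operators. The generating function
(27) yields the following expressions for the mean, $\mu$, and variance,
$\sigma^2$, of the protein distribution

$$\mu = a_r b, \qquad a_r \equiv a\left[\lambda + \frac{\kappa_0}{\kappa_0 + \kappa_1}\right], \tag{29}$$

$$\sigma^{2} = \mu(1+b) + \left[ab(1-\lambda)\right]^{2}\frac{\kappa_0\kappa_1}{\left(\kappa_0 + \kappa_1\right)^{2}\left(\kappa_0 + \kappa_1 + 1\right)}. \tag{30}$$

Since $b$ represents the mean number of proteins synthesized per
mRNA, (29) implies that $a_r$ is the mean frequency of regulated
transcription. The two terms of $a_r$ also have simple physical
interpretations: Since $\lambda$ and $\kappa_0/(\kappa_0 + \kappa_1)$ are the probabilities of
the $O_2\cdot R$ and free states, $a\lambda$ and $a\kappa_0/(\kappa_0 + \kappa_1)$ represent the mean
number of mRNAs produced per cell cycle due to small and large
transcriptional bursts.

Expanding $f(z)$ about $z = 0$ yields the steady state protein
distribution

$$p_n = \frac{b^{n}\,\Gamma(\kappa_0 + \kappa_1)}{\Gamma(a\lambda)\Gamma(\alpha)\Gamma(\beta)} \sum_{j=1}^{n+1} \frac{\Gamma(a\lambda + n - j + 1)\,\Gamma(\alpha + j - 1)\,\Gamma(\beta + j - 1)}{\Gamma(j)\,\Gamma(n - j + 2)\,\Gamma(\kappa_0 + \kappa_1 + j - 1)\,(1+b)^{a\lambda + n - j + 1}}\; {}_2F_1\left(\alpha + j - 1, \beta + j - 1, \kappa_0 + \kappa_1 + j - 1; -b\right). \tag{31}$$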


Figure 3 shows that the protein distributions obtained from this 
expression agree well with those obtained by simulating the full 
model with the Optimized Direct Method implementation of 
Gillespie's Stochastic Simulation Algorithm [33] provided in the 
simulation package StochKit2 [34]. The protein distribution in the 
absence of the inducer, shown in Fig. 3a, was obtained with the 
parameter values in Table 1. The distributions in the presence of
the inducer were obtained by decreasing the association rate, $k_a N$,
10-fold (Fig. 3b) and 20-fold (Fig. 3c). Evidently, (31) is a good
approximation to the simulated distributions in all three cases. We
conclude that our approximate solution is accurate down to a 20-fold
reduction of the association rate.
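The simulations reported above were run with StochKit2 on the full model; the sketch below is not that code. It is a minimal pure-Python version of Gillespie's direct method applied to the reduced two-state model (13)-(14), offered only to illustrate how such a check can be set up. The rounded values of $k_0$, $k_1$ and $\lambda$ are computed from Table 1 through (15)-(17), and the function names, seed, and simulation length are assumptions of this sketch.

```python
import random

# Propensities (s^-1). V0, D0, V1, D1 are Table 1 values; K0 and LAM follow from
# Table 1 through eqs. (15)-(16), and K1 = 3 * k_a N. Rounded, for illustration.
V0, D0, V1, D1 = 0.12, 0.011, 0.044, 0.0002
K0, K1, LAM = 1.97e-5, 0.21, 3.46e-4


def simulate_reduced_model(t_end, seed=1):
    """Direct-method SSA of the two-state model; returns the time-averaged protein count."""
    rng = random.Random(seed)
    t, free, m, n = 0.0, False, 0, 0
    protein_time = 0.0                  # running integral of n(t) dt
    while True:
        switch = K1 if free else K0     # free -> bound, or bound -> free
        transcribe = V0 if free else V0 * LAM
        # Cumulative propensities, accumulated explicitly so the thresholds
        # used for reaction selection are consistent with the total.
        c1 = switch
        c2 = c1 + transcribe
        c3 = c2 + D0 * m                # mRNA degradation
        c4 = c3 + V1 * m                # translation
        total = c4 + D1 * n             # protein dilution
        dt = rng.expovariate(total)
        if t + dt >= t_end:
            protein_time += n * (t_end - t)
            break
        protein_time += n * dt
        t += dt
        pick = rng.random() * total
        if pick < c1:
            free = not free
        elif pick < c2:
            m += 1
        elif pick < c3:
            m -= 1
        elif pick < c4:
            n += 1
        else:
            n -= 1
    return protein_time / t_end


def predicted_mean():
    """Mean of the reduced model, mu = a b [lambda + kappa0 / (kappa0 + kappa1)], eq. (29)."""
    return (V0 / D1) * (V1 / D0) * (LAM + K0 / (K0 + K1))


if __name__ == "__main__":
    print("simulated mean:", simulate_reduced_model(5.0e7))
    print("predicted mean:", predicted_mean())
```

Because protein numbers relax on the slow time scale $1/d_1$, the simulated time span must cover many protein lifetimes before the time average settles near the analytical mean; comparing full distributions, as in Figure 3, would additionally require recording a histogram of $n$.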

Table 2 shows that in the absence of the inducer, $\lambda, \kappa_0/\kappa_1 \ll 1$.
These relations remain valid at the relatively low inducer levels
studied by Choi et al. (< 200 μM TMG). Indeed, under these
conditions, the operon is expressed to no more than 1% of the fully
induced level [12], i.e.,

$$\frac{a_r}{a} = \lambda + \frac{\kappa_0}{\kappa_0 + \kappa_1} \le 0.01 \;\Rightarrow\; \lambda, \frac{\kappa_0}{\kappa_0 + \kappa_1} \le 0.01 \;\Rightarrow\; \lambda, \frac{\kappa_0}{\kappa_1} \ll 1, \tag{32}$$

and (29)-(30) can be rewritten as

$$\mu = a_r b, \qquad a_r \approx a\left(\lambda + \frac{\kappa_0}{\kappa_1}\right), \tag{33}$$

$$\sigma^{2} \approx \mu(1+b) + \frac{a^{2}b^{2}\kappa_0}{\kappa_1^{2}}. \tag{34}$$

It is worth noting that due to rapid loop formation, small
transcriptional bursts are very bursty (pulsatile). Moreover, under
the weakly inducing conditions used in the experiments
(< 200 μM TMG), $k_a N$ is relatively large, and hence, the large
transcriptional bursts are also quite bursty. It follows that under
these conditions, (33)-(34) should be expressible in terms of the size
and frequency of the small and large transcriptional bursts. We
shall show below that this is indeed the case.

Figure 3. Despite a 20-fold change in the repressor association
rate, $k_a N$, the protein distributions derived from the analytical
expression (31) (grey squares) are in good agreement with
those obtained from stochastic simulations of the model (black
disks). (a) Parameter values in Table 1. (b) $k_a N$ is 1/10th of the value in
Table 1; other parameter values as in Table 1. (c) $k_a N$ is 1/20th of the
value in Table 1; other parameter values as in Table 1.
doi:10.1371/journal.pone.0102580.g003

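Strain without auxiliary operators. In the absence of
auxiliary operators, the operon fluctuates between the free and the
$O_1$-bound state, and only the former allows transcription. This is
identical to Shahrezaei & Swain's 3-stage model of a regulated
promoter [22], and corresponds to the special case, $\lambda = 0$,
$k_0 = k_{O_1}$, $k_1 = k_a N$ of our model. It follows that the generating
function for the steady state protein distribution is the Gaussian
hypergeometric function

$$\bar{f}(z) = {}_2F_1\left[\bar{\alpha}, \bar{\beta}, \bar{\kappa}_0 + \bar{\kappa}_1; b(z-1)\right], \tag{35}$$

where

$$\bar{\kappa}_0 = \frac{k_{O_1}}{d_1}, \qquad \bar{\kappa}_1 = \frac{k_a N}{d_1}, \tag{36}$$

and

$$\bar{\alpha}, \bar{\beta} = \frac{1}{2}\left[a + \bar{\kappa}_0 + \bar{\kappa}_1 \pm \sqrt{\left(a + \bar{\kappa}_0 + \bar{\kappa}_1\right)^{2} - 4a\bar{\kappa}_0}\right]. \tag{37}$$

Moreover, the protein distribution is given by the expression

$$\bar{p}_n = \frac{\Gamma(\bar{\alpha} + n)\,\Gamma(\bar{\beta} + n)\,\Gamma(\bar{\kappa}_0 + \bar{\kappa}_1)}{\Gamma(n+1)\,\Gamma(\bar{\alpha})\,\Gamma(\bar{\beta})\,\Gamma(\bar{\kappa}_0 + \bar{\kappa}_1 + n)} \left(\frac{b}{1+b}\right)^{n}\left(1 - \frac{b}{1+b}\right)^{\bar{\alpha}} {}_2F_1\left(\bar{\alpha} + n, \bar{\kappa}_0 + \bar{\kappa}_1 - \bar{\beta}, \bar{\kappa}_0 + \bar{\kappa}_1 + n; \frac{b}{1+b}\right), \tag{38}$$

and the mean and variance are

$$\bar{\mu} = \bar{a}_r b, \qquad \bar{a}_r \equiv a\,\frac{\bar{\kappa}_0}{\bar{\kappa}_0 + \bar{\kappa}_1}, \tag{39}$$

$$\bar{\sigma}^{2} = \bar{\mu}(1+b) + a^{2}b^{2}\,\frac{\bar{\kappa}_0\bar{\kappa}_1}{\left(\bar{\kappa}_0 + \bar{\kappa}_1\right)^{2}\left(\bar{\kappa}_0 + \bar{\kappa}_1 + 1\right)}. \tag{40}$$

At TMG concentrations of < 100 μM, which are equivalent to
an IPTG concentration of < 10 μM, the operon is expressed to
no more than 5% of the fully induced level [35]. It follows that
under the experimental conditions of interest

$$\frac{\bar{a}_r}{a} = \frac{\bar{\kappa}_0}{\bar{\kappa}_0 + \bar{\kappa}_1} \le 0.05 \;\Rightarrow\; \frac{\bar{\kappa}_0}{\bar{\kappa}_1} \lesssim 0.05, \tag{41}$$

and $\bar{\mu}, \bar{\sigma}^{2}$ can be approximated by the expressions

$$\bar{\mu} = \bar{a}_r b, \qquad \bar{a}_r \approx a\,\frac{\bar{\kappa}_0}{\bar{\kappa}_1}, \tag{42}$$

$$\bar{\sigma}^{2} \approx \bar{\mu}(1+b) + \frac{a^{2}b^{2}\bar{\kappa}_0}{\bar{\kappa}_1^{2}}. \tag{43}$$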
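The quality of the approximations (33)-(34) and (42)-(43) can be gauged by evaluating them next to the full expressions (29)-(30) and (39)-(40). The sketch below does this in the absence of the inducer; the function names are illustrative, and the numerical inputs are the rounded entries of Tables 1 and 2, so the outputs should be read as order-of-magnitude checks rather than fitted values.

```python
def protein_moments(a, b, lam, kappa0, kappa1):
    """Mean and variance from eqs. (29)-(30); lam = 0 gives eqs. (39)-(40)."""
    ksum = kappa0 + kappa1
    mean = a * b * (lam + kappa0 / ksum)
    var = mean * (1 + b) + (a * b * (1 - lam)) ** 2 * kappa0 * kappa1 / (ksum ** 2 * (ksum + 1))
    return mean, var


def protein_moments_approx(a, b, lam, kappa0, kappa1):
    """Approximations (33)-(34); lam = 0 gives (42)-(43)."""
    mean = a * b * (lam + kappa0 / kappa1)
    var = mean * (1 + b) + (a * b) ** 2 * kappa0 / kappa1 ** 2
    return mean, var


d1 = 0.0002          # protein dilution (s^-1), Table 1
a, b = 600.0, 4.0    # Table 2

strains = {
    # With auxiliary operators: lambda, k0, k1 from Table 2.
    "with auxiliary operators": (3e-4, 2e-5 / d1, 0.22 / d1),
    # Without auxiliary operators: lambda = 0, k0 = k_O1, k1 = k_a N (Table 1).
    "without auxiliary operators": (0.0, 0.0016 / d1, 0.07 / d1),
}

for name, (lam, kappa0, kappa1) in strains.items():
    exact = protein_moments(a, b, lam, kappa0, kappa1)
    approx = protein_moments_approx(a, b, lam, kappa0, kappa1)
    print(name, "exact:", exact, "approx:", approx)
```

With these inputs the approximation is much tighter for the strain with auxiliary operators, where $\kappa_0/\kappa_1$ is far smaller, than for the strain without them.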



Expressing the statistics in terms of the burst size and 
frequency 

Choi et al. assumed that the quantities $F = \sigma^2/\mu$ and
$\eta^{-2} = (\sigma/\mu)^{-2}$ represent the size and frequency of small
transcriptional bursts, and $\bar{F} = \bar{\sigma}^2/\bar{\mu}$ and $\bar{\eta}^{-2} = (\bar{\sigma}/\bar{\mu})^{-2}$ represent
the size and frequency of large transcriptional bursts. To check the
validity of these assumptions, we shall express (33)-(34) and
(42)-(43) in terms of the size and frequency of the transcriptional bursts.
Given these expressions, we can immediately infer the dependence
of $F, \eta^{-2}, \bar{F}, \bar{\eta}^{-2}$ on the size and frequency of the transcriptional
bursts, and then compare them to the assumptions made by Choi
et al.

Strain with auxiliary operators. To express $\mu = a_r b$ in
terms of the size and frequency of the transcriptional bursts, we
begin by recalling that $a_r$ consists of two terms, $a\chi$ and
$a\kappa_0/\kappa_1$, which represent the mean frequency of transcription
due to partial and complete dissociations of the repressor, respectively.
Since partial dissociations occur when a repressor trapped in the
$O_1O_2$-loop dissociates from $O_1$, we define the number of the partial
dissociations per cell cycle as

[Eq. (44), defining $a_p$, is illegible in the source.]

where we have appealed to the detailed balance between the
operon states $O_2R$ and $O_1RO_2$. We also define the number of
mRNAs synthesized per partial dissociation as

$$b_p \equiv \frac{v_0}{k_{O_1O_2}}, \tag{45}$$

since the time for rebinding of a partially dissociated repressor to
$O_1$ is on the order of $k_{O_1O_2}^{-1}$. It follows from these definitions that

$$a_p b_p \approx a\chi, \tag{46}$$

i.e., we have successfully expressed the first term of $a_r$ in terms of
the frequency and mRNA burst size due to partial dissociations. We
now proceed to express the second term of $a_r$ in terms of the
frequency and mRNA burst size due to complete dissociations.
Since complete dissociations occur whenever the operon becomes
repressor-free, it is natural to define the number of complete
dissociations per cell cycle as

[Eq. (47), defining $a_c$, is illegible in the source.]

We also define the number of mRNAs synthesized per complete
dissociation as

$$b_c \equiv \frac{v_0}{k_1} = \frac{v_0}{3k_a N}, \tag{48}$$

because the time for rebinding of a completely dissociated
repressor to an operator is on the order of $k_1^{-1}$. Evidently

$$a_c b_c \approx a\,\frac{\kappa_0}{\kappa_1} \approx a\chi\,\frac{k_{O_2}}{k_a N}, \tag{49}$$

and we conclude that

$$a_r \approx a\left(\chi + \frac{\kappa_0}{\kappa_1}\right) = a_p b_p + a_c b_c. \tag{50}$$

Hence, (33)-(34) can be rewritten as

$$\mu = a_r b \approx (a_p b_p + a_c b_c)\,b, \tag{51}$$

$$\sigma^2 \approx \mu(1+b) + a_c (b_c b)^2, \tag{52}$$

which imply that

$$F = \frac{\sigma^2}{\mu} \approx 1 + b + f_c (b_c b), \tag{53}$$

$$\eta^{-2} = \frac{\mu}{F} \approx \frac{(a_p b_p + a_c b_c)\,b}{1 + b + f_c b_c b} = \frac{(a_c b_c/f_c)\,b}{1 + b + f_c b_c b}, \tag{54}$$

where

$$f_c \equiv \frac{a_c b_c}{a_p b_p + a_c b_c} \approx \frac{k_{O_2}/k_a N}{1 + k_{O_2}/k_a N} \tag{55}$$

is the fraction of proteins derived from complete dissociations. It
follows from (53) that the total burstiness, $F$, is entirely due to
translational and large transcriptional bursts. Moreover, the
burstiness of large transcriptional bursts depends on their intrinsic
burstiness, $b_c b$, suitably weighted by $f_c$, the fraction of proteins
derived from such bursts. Importantly, $f_c$ is completely determined
by $k_{O_2}/k_a N$, the equilibrium constant for dissociation of the
repressor from $O_2$. In the absence of the inducer, this equilibrium
constant is 0.25 [25,26], and hence, $f_c = 0.2$, i.e., 20% of the
proteins are derived from large transcriptional bursts. As the
inducer concentration increases, $f_c$ increases because $k_a N$
decreases.

Strain without auxiliary operators. In this case, if we
define the number of complete dissociations per cell cycle as

[Eq. (56), defining $\hat a_c$, is illegible in the source.]

and the number of mRNAs synthesized per complete dissociation
as

$$\hat b_c \equiv \frac{v_0}{k_a N}, \tag{57}$$

the mean frequency of regulated transcription can be rewritten as

$$a_r \approx a\,\frac{\kappa_0}{\kappa_1} = \hat a_c \hat b_c. \tag{58}$$

It follows that (42)-(43) can be rewritten as

$$\hat\mu = a_r b \approx \hat a_c \hat b_c b, \tag{59}$$

$$\hat\sigma^2 = \hat\mu(1+b) + \hat a_c \hat b_c^2 b^2, \tag{60}$$

which imply that

$$\hat F = \frac{\hat\sigma^2}{\hat\mu} = 1 + b + \hat b_c b, \tag{61}$$

$$\hat\eta^{-2} = \frac{\hat\mu}{\hat F} = \frac{a_r b}{1 + b + \hat b_c b} = \frac{\hat a_c \hat b_c b}{1 + b + \hat b_c b}. \tag{62}$$



We are now ready to address questions concerning the physical 
meaning of the parameters of the distribution and their variation 
with inducer concentration [12]. 
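The burst-size relations above can be explored numerically with the short Python sketch below. It assumes the forms $f_c = K/(1+K)$ with $K = k_{O_2}/k_a N$, $F = 1 + b + f_c b_c b$ and $\eta^{-2} = \mu/F$ for the strain with auxiliary operators, and $\hat F = 1 + b + \hat b_c b$ for the strain without them. The function and argument names are hypothetical, and the numerical inputs in the usage section are illustrative rather than fitted values.

```python
def fraction_from_large_bursts(K):
    """f_c from Eq. (55); K is the equilibrium constant k_O2 / (k_a N)."""
    return K / (1.0 + K)


def stats_with_aux(ap_bp, ac_bc, b_c, b):
    """F and eta^-2 from Eqs. (53)-(54), strain with auxiliary operators.

    ap_bp, ac_bc: mean mRNAs per cell cycle from small and large bursts
    b_c: mRNAs per large burst; b: proteins per mRNA
    """
    f_c = ac_bc / (ap_bp + ac_bc)
    F = 1.0 + b + f_c * b_c * b
    eta_inv2 = (ap_bp + ac_bc) * b / F
    return F, eta_inv2


def stats_without_aux(ac_hat, bc_hat, b):
    """F-hat and eta-hat^-2 from Eqs. (61)-(62), strain without auxiliary operators."""
    F = 1.0 + b + bc_hat * b
    eta_inv2 = ac_hat * bc_hat * b / F
    return F, eta_inv2


if __name__ == "__main__":
    # Uninduced cells: equilibrium constant 0.25 as quoted in the text.
    print("f_c =", fraction_from_large_bursts(0.25))
    # Illustrative large-burst limit: no small bursts and b_c * b >> 1 + b.
    print(stats_with_aux(ap_bp=0.0, ac_bc=0.2 * 100.0, b_c=100.0, b=4.0))
```

In the large-burst limit the second output approaches the burst frequency $a_c$, which is the behavior the text attributes to high inducer levels.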



Discussion 

Interpretation of the protein distribution data 

Strain with auxiliary operators. Interpretation of $F$ and
$\eta^{-2}$ derived from filtered data. Choi et al. assumed that $F$
and $\eta^{-2}$ derived from the filtered data (Fig. 2c) represent the size
and frequency of small transcriptional bursts. In terms of our
model, these assumptions have the form

$$F \approx b_p b, \tag{63}$$

$$\eta^{-2} \approx a_p. \tag{64}$$



However, (53)-(54) imply that this $F$ and $\eta^{-2}$, obtained by
eliminating the contribution of the large transcriptional bursts,
have a different physical meaning. Indeed, (53) implies that the
Fano factor obtained from the filtered data has the form
$F = 1 + b$, which represents the size of the translational, rather
than small transcriptional, bursts. Similarly, (54) implies that the
reciprocal of the noise derived from the filtered data has the form
$\eta^{-2} = a_p b_p b/(1+b)$, which is proportional to $a_p b_p$, the average
number of mRNAs derived from small bursts, rather than the
frequency of the small bursts. Since $\eta^{-2} \approx 1$ (Fig. 2c) and
$b \approx 4$, our interpretation of the filtered data implies that
$a_p b_p \approx 1.25$, which is close to the estimate obtained from the
model (Table 3).
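The arithmetic behind this estimate is a one-line inversion of $\eta^{-2} = a_p b_p b/(1+b)$. The following Python sketch makes it explicit; the function name is hypothetical, and the inputs are the approximate values quoted in the text.

```python
def mrnas_from_small_bursts(eta_inv2, b):
    """Invert eta^-2 = a_p*b_p*b/(1+b) for the product a_p*b_p.

    eta_inv2: reciprocal noise of the filtered data
    b: translational burst size (proteins per mRNA)
    """
    return eta_inv2 * (1.0 + b) / b


if __name__ == "__main__":
    # eta^-2 ~ 1 (Fig. 2c) and b ~ 4, as quoted in the text.
    print(mrnas_from_small_bursts(1.0, 4.0))
```

Only the product $a_p b_p$ is obtained this way; the filtered data do not fix $a_p$ and $b_p$ separately.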

Evidently, there is a discrepancy between the assumptions of 
Choi et al. and the implications of our model. To understand its 
origin, observe that their assumptions are equivalent to the 
relations 



$$\mu = F\eta^{-2} \approx a_p b_p b, \tag{65}$$

$$\sigma^2 = F\mu \approx a_p (b_p b)^2, \tag{66}$$

i.e., they assumed, in effect, that both the mean and the variance 
are dominated by contributions from small transcriptional bursts. 
In contrast, (51)-(52) show that small bursts contribute to the
mean, but not to the variance. This difference arises because we 
assumed that looping is so fast that the rapid fluctuations due to 
partial dissociations are averaged out on the slow time scale of the 
other processes. This averaging process preserves the contribution 
of small transcriptional bursts to the mean, but eliminates their 
contribution to the variance. 

The assumption $F \approx b_p b$ appears to be implausible. Indeed, (53)
implies that translational bursts contribute the term $b$ to the Fano
factor. For the small bursts to make a significant, let alone
dominant, contribution to the Fano factor, $b_p$ must be of order 1,
i.e., on average, approximately one mRNA must be synthesized
per partial dissociation. However, looping is so fast compared to
transcription that $b_p = v_0/k_{O_1O_2} \approx 0.03$ in the absence of the
inducer (Table 3). Moreover, $b_p$ is unlikely to change even in the
presence of the inducer since $v_0$ and $k_{O_1O_2}$ are constant over the
range of inducer concentrations used in the experiments. We
conclude that the bursts due to partial dissociations are so small
that they cannot be the dominant source of burstiness.

Interpretation of $F$ and $\eta^{-2}$ derived from raw
data. Choi et al. rejected the raw data shown in Fig. 2b since
the occurrence of large bursts in a few cells distorted the statistics
of the small bursts. We show below that these data are a valuable
source of information about the statistics of large bursts.
Specifically, (53)-(54) predict the observed variation of $F$ and
$\eta^{-2}$ derived from the raw data, and thus provide a method for
estimating not only the size and frequency of the large 
transcriptional bursts, but also the fraction of proteins derived 
from them. This method is particularly useful because, as we show 
below, there are simple relationships between the size and 
frequency of the large bursts in strains SX701 and SX703, but 
they are not identical. 

The analysis of the raw data shows that the total burstiness, $F$,
increases with inducer concentration (Fig. 2b). Eq. (53) implies that
this is due to the growing burstiness of the large transcriptional
bursts: since both $b_c$ and $f_c$ increase with inducer level, so does
$f_c b_c b$. This increase occurs so rapidly that at 100 μM TMG, large
transcriptional bursts become the dominant source of burstiness,
i.e., $F \approx f_c b_c b$. Indeed, assuming $b \approx 4$, (53) implies
$F \approx f_c b_c b$ whenever $F \gg 5$. Inspection of Fig. 2b shows that
at 100 μM TMG, $F \approx 25$, and hence, $F \approx f_c b_c b$. We shall show
below that at such inducer levels, $f_c \approx 1$ and $F \approx b_c b$.

In contrast to the total burstiness, $F$, the reciprocal of the total
noise, $\eta^{-2} = \mu/F$, decreases with inducer concentration until it
reaches a constant value (Fig. 2b). The model suggests that this is
because both $\mu$ and $F$ increase with inducer level, but $F$ increases
faster than $\mu$: indeed, both $b_c$ and $f_c$ increase with inducer level,
and Eq. (54) shows that $\mu$ is proportional to the ratio $b_c/f_c$,
whereas $F$ increases with the product $f_c b_c$. The decreasing trend of
$\eta^{-2}$ continues until the inducer levels become so high that large
bursts account for all the proteins ($f_c \approx 1$) and burstiness
($F \approx b_c b$). Under these conditions $\eta^{-2}$ approaches $a_c$, the
frequency of large bursts, which is independent of inducer concentration.
Comparison with the data in Fig. 2b then implies that $a_c \approx 0.2$.

Given $a_c \approx 0.2$ and $b \approx 4$, (53)-(54) provide a method for
estimating the variation of $b_c b$ and $f_c$ with inducer levels from the
raw data for SX701. To see this, it is convenient to rewrite (53)-(54)
in the form

$$f_c (b_c b) = F - (1+b), \tag{67}$$

$$\frac{b_c b}{f_c} = \frac{F\eta^{-2}}{a_c}. \tag{68}$$

Since the variation of $F$ and $\eta^{-2}$ with the inducer concentration
is known (Fig. 2b), we can solve the above equations to obtain $b_c b$
and $f_c$ as functions of the inducer concentration. These calculated
profiles, shown in Fig. 2d, agree with the claims above: both $b_c b$
and $f_c$ increase with the inducer level, and the latter approaches 1
at 100 μM TMG.
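One way to carry out this calculation is sketched below in Python. It treats the two relations as a product, $f_c (b_c b) = F - (1+b)$, and a ratio, $(b_c b)/f_c = F\eta^{-2}/a_c$, so that $b_c b$ is the square root of their product and $f_c$ the square root of their quotient. The function name is hypothetical, and the defaults $a_c = 0.2$ and $b = 4$ are the approximate values quoted in the text; this is not necessarily the procedure used to generate Fig. 2d.

```python
import math


def large_burst_parameters(F, eta_inv2, a_c=0.2, b=4.0):
    """Solve Eqs. (67)-(68) for (b_c*b, f_c) given the raw-data F and eta^-2."""
    product = F - (1.0 + b)        # f_c * (b_c b), Eq. (67)
    ratio = F * eta_inv2 / a_c     # (b_c b) / f_c, Eq. (68)
    if product <= 0.0 or ratio <= 0.0:
        raise ValueError("need F > 1 + b and eta^-2 > 0")
    bc_b = math.sqrt(product * ratio)
    f_c = math.sqrt(product / ratio)
    return bc_b, f_c


if __name__ == "__main__":
    # Synthetic check: build F and eta^-2 from known values, then recover them.
    f_true, bcb_true, a_c, b = 0.5, 40.0, 0.2, 4.0
    F = 1.0 + b + f_true * bcb_true
    eta_inv2 = (a_c * bcb_true / f_true) / F
    print(large_burst_parameters(F, eta_inv2, a_c, b))
```

With noisy data the computed $f_c$ can slightly exceed 1, so in practice one might wish to clip it to the interval [0, 1].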

Strain without auxiliary operators. Interpretation of $\hat F$
and $\hat\eta^{-2}$. Choi et al. assumed that the $\hat F$ and $\hat\eta^{-2}$ shown in
Fig. 2a represent the size and frequency of large transcriptional
bursts, i.e.,

$$\hat F \approx \hat b_c b, \tag{69}$$

$$\hat\eta^{-2} \approx \hat a_c. \tag{70}$$

Our model implies that these relations are valid at all non-zero
inducer concentrations used in the experiments. Indeed, since
$b \approx 4$, (61)-(62) imply that the above relations are valid whenever
$\hat F \gg 5$, which is satisfied ($\hat F \gtrsim 25$) at all the non-zero inducer
concentrations used in the experiments (Fig. 2a). In particular,
comparison with the data in Fig. 2a implies that $\hat a_c \approx 3$.

Relationships between the statistics of large bursts in the
strains with and without auxiliary operators. The model
predicts simple relationships between the size and frequency of the
large transcriptional bursts in strains SX701 and SX703, which
provide tests for checking the consistency of the model. Indeed, it
follows from (48) and (57) that $b_c/\hat b_c = 1/3$, a relationship that is
also mirrored by the data (compare full and dashed lines in Fig. 2d).
Similarly, (47) and (56) imply that

a c m ' n ko l o 2 /l <: 02+k 0l o 3 /ko 3 

a ratio estimated to be 1/80 based on the values in Table 1, which
is of the same order of magnitude as the value 1/15, obtained from
the experimentally determined values of $a_c \approx 0.2$ and $\hat a_c \approx 3$.

Condition for the negative binomial distribution 

Choi et al. assumed that the protein distributions of both strains 
follow the Gamma distribution, the continuous analog of the 
negative binomial distribution. We have shown above that neither 
one of the strains follows the negative binomial distribution. Here, 
we demonstrate that the distributions can reduce to the negative
binomial distribution, but only if the large burst size is
negligibly small, i.e., the association rate, $k_a N$, is much larger than
the transcription rate, $v_0$. Under this condition, even the large
bursts are averaged out, and they contribute to the mean, but not 
the variance or the burstiness. 

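We begin by considering the strain without auxiliary operators.
Under the weakly induced conditions used in the experiments,
$\kappa_0 \ll \kappa_1$, and the generating function for the protein
distribution is the negative hypergeometric function

$$f(z) \approx {}_2F_1\left[\alpha, \beta, \kappa_1; b(z-1)\right] \equiv \sum_{k=0}^{\infty} \frac{(\alpha)_k (\beta)_k}{(\kappa_1)_k} \frac{\{b(z-1)\}^k}{k!}, \tag{72}$$

which reduces to the generating function for the negative binomial
distribution precisely when $\alpha = \kappa_1$ or $\beta = \kappa_1$. Now (37) implies that

$$\alpha \approx a + \kappa_1 \approx \kappa_1(\hat b_c + 1), \tag{73}$$

$$\beta \approx \frac{a\kappa_0}{a + \kappa_1} \approx \frac{a_r}{\hat b_c + 1}, \qquad a_r = a\,\frac{\kappa_0}{\kappa_1} = \hat a_c \hat b_c. \tag{74}$$

The condition $\beta = \kappa_1$ can never be satisfied since $\kappa_1/\kappa_0 \gg 1$.
However, $\alpha \approx \kappa_1$ precisely when $\hat b_c \ll 1$, in which case
$\beta \approx a_r$ and

$$f(z) \approx \sum_{k=0}^{\infty} (a_r)_k \frac{\{b(z-1)\}^k}{k!} = [1 - b(z-1)]^{-a_r}, \tag{75}$$

which is the generating function for the negative binomial
distribution

$$p_n = \frac{\Gamma(a_r + n)}{\Gamma(n+1)\,\Gamma(a_r)} \left(\frac{b}{1+b}\right)^{n} \left(1 - \frac{b}{1+b}\right)^{a_r}. \tag{76}$$

It is worth noting that under this condition

$$\hat\mu = a_r b, \quad \hat\sigma^2 = \hat\mu(1+b) \;\Rightarrow\; \hat F = 1 + b, \quad \hat\eta^{-2} = \frac{a_r b}{1+b}, \tag{77}$$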

i.e., large transcriptional bursts make no contribution to the 
burstiness. 
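The moments quoted for the negative binomial limit can be checked numerically. The Python sketch below evaluates the distribution $p_n = \frac{\Gamma(a_r+n)}{\Gamma(n+1)\Gamma(a_r)} \left(\frac{b}{1+b}\right)^n (1+b)^{-a_r}$ in log space and sums it directly; the function names, the truncation at `n_max`, and the parameter values are illustrative assumptions.

```python
import math


def neg_binomial_pmf(n, a_r, b):
    """Negative binomial p_n with shape a_r and mean burst size b, via log-gamma."""
    log_p = (math.lgamma(a_r + n) - math.lgamma(n + 1) - math.lgamma(a_r)
             + n * math.log(b / (1.0 + b)) - a_r * math.log(1.0 + b))
    return math.exp(log_p)


def summarize(a_r, b, n_max=2000):
    """Return (total probability, mean, Fano factor) from a truncated sum."""
    p = [neg_binomial_pmf(n, a_r, b) for n in range(n_max)]
    total = sum(p)
    mean = sum(n * pn for n, pn in enumerate(p))
    second = sum(n * n * pn for n, pn in enumerate(p))
    fano = (second - mean * mean) / mean
    return total, mean, fano


if __name__ == "__main__":
    # Illustrative values: a_r = 1.25 and b = 4.
    total, mean, fano = summarize(1.25, 4.0)
    print(f"sum = {total:.6f}, mean = {mean:.6f}, Fano = {fano:.6f}")
```

The truncated sums should reproduce a mean of $a_r b$ and a Fano factor of $1 + b$, independent of $a_r$, which is the sense in which transcriptional bursts no longer contribute to the burstiness.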

A similar argument shows that the generating function for the
strain with auxiliary operators reduces to

$$f(z) = [1 - b(z-1)]^{-a_r}, \qquad a_r = a\left(\chi + \frac{\kappa_0}{\kappa_1}\right), \tag{78}$$

precisely when $b_c \ll 1$. Under this condition, the proteins follow the
negative binomial distribution

$$p_n = \frac{\Gamma(a_r + n)}{\Gamma(n+1)\,\Gamma(a_r)} \left(\frac{b}{1+b}\right)^{n} \left(1 - \frac{b}{1+b}\right)^{a_r}, \tag{79}$$

and

$$\mu = a_r b, \quad \sigma^2 = \mu(1+b) \;\Rightarrow\; F = 1 + b \approx b, \quad \eta^{-2} = \frac{a_r b}{1+b}, \tag{80}$$

i.e., even the large transcriptional bursts do not contribute to the
burstiness.

Figure 4. Protein distribution data for strain SX703 (full circles)
at various TMG concentrations fitted with the Gamma
distribution by Choi et al. (dashed curve) and the negative
hypergeometric distribution (full curve). The negative hypergeometric
distribution was fitted with the parameter values in Table 1,
except $k_a N$, which was decreased with increasing inducer concentration.
(a) Data obtained at 50 μM TMG fitted with k a N = Q.02i s -1 . (b)
Data obtained at 100 μM TMG fitted with k a N = 0.Q\Z s _1 . (c) Data
obtained at 200 μM TMG fitted with $k_a N$ = 0.0054 s$^{-1}$.
doi:10.1371/journal.pone.0102580.g004

We have shown above that the proteins follow the negative
binomial distribution only if the large bursts are, in fact, rather
small, and hence, do not contribute to the burstiness. But it follows
from the data in Figs. 2a,b that these bursts do contribute
significantly to the burstiness of strains SX701 and SX703: if
this were not true, (77) and (80) would imply that the burstiness is
independent of inducer concentration, which contradicts the data.
The negative binomial distribution is therefore unlikely to provide
good fits to the raw data for either strain, but will fit the filtered
data well, since the contribution of large bursts has been
eliminated from it. The fits in Choi et al. are consistent with this
conclusion. The Gamma distribution fits the filtered data for strain
SX701 rather well. However, this is less so for the protein
distributions obtained with strain SX703, which exhibits only large
bursts. Figure 4 shows that better fits are obtained with the
negative hypergeometric distribution (38).

Conclusions 

We formulated and solved a stochastic model of lac expression 
accounting for auxiliary operators and DNA looping. Based on a 
comparison of our expressions for the Fano factor, noise, and 
protein distribution of strains SX701 (with auxiliary operators) and 
SX703 (without auxiliary operators) with those proposed by Choi 
et al., we arrive at the following conclusions: 

1. The physical interpretations of the Fano factor $\hat F$ and
reciprocal noise $\hat\eta^{-2}$ for strain SX703 are identical to those
proposed by Choi et al., namely $\hat F$ and $\hat\eta^{-2}$ represent the size
and frequency of (large) transcriptional bursts.

2. The physical interpretations of the Fano factor $F$ and
reciprocal noise $\eta^{-2}$ derived from the filtered data for
SX701 differ from those given by Choi et al., who took $F$ and
$\eta^{-2}$ to represent the size and frequency of small transcriptional
bursts. Instead, we find that $F$ represents the size of
translational bursts, and $\eta^{-2}$ is proportional to the mean
number of mRNAs derived from small transcriptional bursts.
Our interpretation is different because we assume that looping
is so fast that fluctuations due to small transcriptional bursts are
averaged out; small bursts therefore contribute to the mean,
but not the burstiness, of the protein distribution. This has two
consequences:

(a) The information lost due to the averaging implies that the 
small burst size and frequency cannot be separately 
extracted from the data. At best, we can only determine 
the product of the small burst size and frequency, which 
represents the mean number of mRNAs derived from small 
bursts. 

(b) The burstiness is entirely due to translational and large
transcriptional bursts. In particular, the burst size derived
from the filtered data for strain SX701, from which the
contribution of the large bursts has been deliberately
eliminated, yields the size of translational, rather than small
transcriptional, bursts.

3. Choi et al. did not consider the raw data for SX701 because
large bursts, although rare, contributed significantly to protein
synthesis. This is consistent with our model: even in uninduced
cells, 20% of the proteins are derived from large bursts. We
find that the raw data contain valuable information about the
statistics of large bursts. By analyzing these data with our model,
we isolate not only the size and frequency of large bursts, but
also the fraction of proteins derived from them. The large burst
size obtained in this manner is consistent with another
prediction of the model, namely, it is one-third of the (large)
burst size in strain SX703. The model also predicts that the
fraction of proteins derived from large bursts is completely
determined by a measurable quantity, namely the dissociation
constant for binding of the repressor to the auxiliary operator
$O_2$.

4. The protein distributions for both strains are not negative
binomial: SX703 follows a negative hypergeometric distribution,
and SX701 follows a mixture of the negative binomial and
negative hypergeometric distributions that reflects the existence
of two sub-populations of proteins, namely, those derived from
small and large bursts. Negative binomial distributions are
attained only if large bursts are insignificant, a condition that
holds only if the data are filtered by eliminating the
contribution of such bursts.

These results imply that interpretation of the steady state
protein distributions depends crucially on the details of the
regulatory mechanisms.

activity in individual bacteria. Cell 123: 1025-36.

9. Cai L, Friedman N, Xie XS (2006) Stochastic protein expression in individual
cells at the single molecule level. Nature 440: 358-62.

10. Yu J, Xiao J, Ren X, Lao K, Xie XS (2006) Probing gene expression in live cells,
one protein molecule at a time. Science (New York, NY) 311: 1600-3.